Project: Raw2Ent
ARexx : Raw2Ent.rexx
Version: 1.4.1 (14.07.96)
Program: Raw2Ent
Version: 1.5 (10.11.96)
Author : Tamio Patrick Honma
Files  : CWISENV
         Raw2Ent
         Raw2Ent.doc
         Raw2Ent.rexx
         Raw2Ent.rexx.old
CONTENTS:
1. INTRODUCTION
1.1. REQUIREMENTS
2. USAGE
2.1. Raw2Ent VER: 1.5 (10.11.96)
2.2. Raw2Ent.rexx VER: 1.4.1 (14.07.96)
2.3. CWISENV
2.4. ARGUMENT-PRIORITY
3. LIMITATIONS
4. INSTALLATION
5. EXAMPLES
6. BYE!
7. LAST COMMENT
8. BUG REPORTS
9. HISTORY
1. INTRODUCTION
Raw2Ent converts raw 8-Bit-ASCII-Text into 7-Bit-ASCII-Text with
entity-codes, and back again. The ASCII-Format is a standardized format for
information interchange, but it is only standardized seven bits wide, which
means that only 128 codes are defined. One byte consists of eight bits and
can represent 256 different bit combinations. Therefore the upper 128 bit
combinations are left free for each operating system to define. The problem
is that accented characters and other special characters are not
standardized, because they are defined in (guess where?! ;) ) the free part
of ASCII by the operating-system developers.
The goal of the World Wide Web developers was that it could be used on every
important operating system. So it was clear that the ASCII-based
HTML-Source-Code had to stay within the standardized seven-bit area of the
ASCII-Code. To represent accented characters or other special characters in
a seven-bit-code, it was necessary to invent something. And this was the
entity-code - a kind of escape-code. An entity-code consists of an
introducing "&" and a closing ";". Between these symbols is a
character-name the browser can interpret, e.g. "&uuml;" for "ü". Converting
ASCII-Text by hand is hard and tedious work. So just use Raw2Ent!
Raw2Ent produces real 7-Bit-ASCII-Code. All printable Amiga-characters in
the 8-bit-area will be converted into entity-codes, without any exception.
The use of names instead of code-numbers makes the entity-codes easier for
humans to read.
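The idea can be sketched in a few lines of Python. This is only an illustration of the principle, not the assembler-program itself: the function name raw_to_entities is made up for this example, and the entity-names come from Python's own html.entities table, which is assumed to match the names Raw2Ent uses for Latin-1.

```python
from html.entities import codepoint2name


def raw_to_entities(text: str) -> str:
    """Replace & < > " and every character above code 127 by an entity-code."""
    out = []
    for ch in text:
        code = ord(ch)
        if code > 127 or ch in '&<>"':
            name = codepoint2name.get(code)
            # Fall back to the code-number when no entity-name is known.
            out.append(f"&{name};" if name else f"&#{code};")
        else:
            out.append(ch)
    return "".join(out)


print(raw_to_entities('Düsseldorf & "Köln"'))
# prints: D&uuml;sseldorf &amp; &quot;K&ouml;ln&quot;
```

The result contains only characters from the seven-bit area, so it survives on any system.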
1.1. REQUIREMENTS
- AmigaOS 2.0 or greater
- optional: ARexx
2. USAGE
Raw2Ent consists of three parts: one assembler-program, one ARexx-Script
and one Batch-File.
If you just want to convert a text once, you only need the
assembler-program. If you want to convert the same text repeatedly
because you work on a project, like a web-page with current information, the
ARexx-Script may be useful.
2.1. Raw2Ent VER: 1.5 (10.11.96)
arguments:
FROM/A            - The source-file (eight bit wide)
TO/A              - The destination-file (with entity-codes)
                    [path without filename is not accepted]
TAG/S             - activates the TAG-Mode
HTML/S            - activates the HTML-Mode
ENT/S             - default mode
UML=NOENT/S       - replaces high-bit characters by plain characters or
                    words
CODE/S            - writes all entity-codes as code-numbers
                    (except the four special entities)
TOTALCODE/S       - writes ALL characters as entity-code-numbers
SMART/S           - activates the SMART-Mode
INVERSE=ENT2RAW/S - inverts the function of Raw2Ent to Ent2Raw
modes:
>TAG-Mode<
will not convert the four characters: & < > ". This is useful for
ASCII-Text which already contains entity-codes or HTML-Tags. Tags are
introduced and ended by "<" and ">" and can contain quotes, and the
"&"-character introduces the entity-codes. If you use the TAG-Mode,
entity-codes already present in the source-file will not be wrongly
converted a second time, but special characters that are still untouched
will be converted. Therefore you should use this mode whenever you convert
a text a second time.
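A rough Python sketch of the TAG-Mode behaviour described above, under the same assumptions as before (a made-up function name, entity-names taken from Python's html.entities table): only characters above code 127 are touched, so existing tags and entity-codes pass through unchanged.

```python
from html.entities import codepoint2name


def raw_to_entities_tag(text: str) -> str:
    """TAG-Mode sketch: convert high-bit characters, leave & < > " alone."""
    out = []
    for ch in text:
        code = ord(ch)
        if code > 127:
            name = codepoint2name.get(code)
            out.append(f"&{name};" if name else f"&#{code};")
        else:
            # Includes & < > " which are copied as they are.
            out.append(ch)
    return "".join(out)


# "&uuml;" was already converted in an earlier run, "ß" was added later.
print(raw_to_entities_tag('<b>Gr&uuml;ße</b>'))
# prints: <b>Gr&uuml;&szlig;e</b>
```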
>HTML-Mode<
will just copy the source-file to the destination-file. This feature is
implemented to make the program easier to use in script-files (see e.g.
Raw2Ent.rexx).
>ENT-Mode<
is the default mode and converts every known character into its entity-code.
>NOENT-Mode< or >UML-Mode<
is a mode which replaces each high-bit character by characters from the
low-bit area without using character-entity-codes. E.g.: "ü" will be
converted to "ue", "£" will be converted to "pound", "©" will be
converted to "(C)" and so on. It is recommended to use this argument
together with the "TAG"-argument. Optionally you can use >UML< (like:
Umlaute), which is a synonym for >NOENT<. [This mode was inspired by
Andreas Bais]
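A minimal Python sketch of such a replacement table. Only the three entries marked "from the text" are documented above; the other entries, the "?" placeholder for unlisted characters and the function name raw_to_plain are assumptions made for this example.

```python
REPLACEMENTS = {
    "ü": "ue",      # from the text
    "£": "pound",   # from the text
    "©": "(C)",     # from the text
    "ä": "ae",      # assumed by analogy
    "ö": "oe",      # assumed by analogy
    "ß": "ss",      # assumed by analogy
}


def raw_to_plain(text: str) -> str:
    """NOENT-Mode sketch: replace high-bit characters by low-bit text."""
    out = []
    for ch in text:
        if ord(ch) > 127:
            # "?" is a placeholder of this sketch for unlisted characters.
            out.append(REPLACEMENTS.get(ch, "?"))
        else:
            out.append(ch)
    return "".join(out)


print(raw_to_plain("Tür © 1996, £5"))
# prints: Tuer (C) 1996, pound5
```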
>CODE-Mode<
will write every entity-code not with its entity-name, but with its
entity-code-number. This may be useful if a browser doesn't support all
entity-names, but does support the numbers. Note: On the one hand
code-numbers are hard for humans to read, but on the other hand the
destination-file may be shorter. [The shortest files may be produced in the
"TAG NOENT"-Mode.]
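As a hedged Python sketch of the CODE-Mode: high-bit characters become code-numbers, while the four special characters keep their entity-names. The function name and the tag_mode parameter (standing in for the "TAG"-switch) are made up for this example.

```python
SPECIAL = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}


def raw_to_codes(text: str, tag_mode: bool = False) -> str:
    """CODE-Mode sketch: code-numbers for high-bit characters."""
    out = []
    for ch in text:
        if ord(ch) > 127:
            out.append(f"&#{ord(ch)};")
        elif ch in SPECIAL and not tag_mode:
            # The four special entities are still written by name.
            out.append(SPECIAL[ch])
        else:
            out.append(ch)
    return "".join(out)


print(raw_to_codes("ü & é"))
# prints: &#252; &amp; &#233;
```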
>TOTALCODE-Mode< or >TOTCODE-Mode<
will replace EVERY character by its code-number. The only use of this mode
is to make it hard for humans to read the text-file. The text will only be
displayed readably by an HTML-Browser. This mode will surely produce the
largest destination-files!
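In Python the TOTALCODE-Mode idea fits in one line. Whether Raw2Ent also converts control characters such as line feeds is not stated here; this sketch (with a made-up function name) simply converts everything.

```python
import html


def raw_to_total_codes(text: str) -> str:
    """TOTALCODE-Mode sketch: every character becomes a code-number."""
    return "".join(f"&#{ord(ch)};" for ch in text)


encoded = raw_to_total_codes("Hi")
print(encoded)
# prints: &#72;&#105;

# A browser (here: Python's html.unescape) turns it back into text.
print(html.unescape(encoded))
# prints: Hi
```

Two characters grow to eleven, which shows why this mode produces the largest files.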
>SMART-Mode<
is a combination of the >ENT<-Mode and the >TAG<-Mode. HTML-Files for
example will be converted without destruction of HTML-Tags and
character-entity-codes - like the >TAG<-Mode. The difference is that the
characters: < > & " will be converted, if Raw2Ent "thinks" that these
characters are not part of character-entity-codes or HTML-Tags. This
works best if the HTML-File contains "good" code. I cannot guarantee a
correct interpretation by Raw2Ent, but I think it works with 95% of
HTML-Files without mistakes.
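How Raw2Ent "thinks" is not documented here. The following Python sketch shows one possible heuristic, purely as an assumption: anything that looks like a tag ("<" followed by a letter, up to the next ">") or like an entity-code ("&", a name or number, ";") is treated as markup; every other & < > " is converted. The pattern and all names are made up for this example.

```python
import re
from html.entities import codepoint2name

# Hypothetical heuristic, not the rule used by the assembler-program.
MARKUP = re.compile(r"</?[A-Za-z][^<>]*>|&#?[A-Za-z0-9]+;")


def encode(text: str, keep_specials: bool) -> str:
    out = []
    for ch in text:
        code = ord(ch)
        if code > 127 or (ch in '&<>"' and not keep_specials):
            name = codepoint2name.get(code)
            out.append(f"&{name};" if name else f"&#{code};")
        else:
            out.append(ch)
    return "".join(out)


def smart_convert(text: str) -> str:
    """SMART-Mode sketch: protect markup, convert everything else."""
    out, pos = [], 0
    for match in MARKUP.finditer(text):
        out.append(encode(text[pos:match.start()], keep_specials=False))
        out.append(encode(match.group(), keep_specials=True))
        pos = match.end()
    out.append(encode(text[pos:], keep_specials=False))
    return "".join(out)


print(smart_convert("<p>Müller & Söhne: 3 < 5 &amp; ok</p>"))
# prints: <p>M&uuml;ller &amp; S&ouml;hne: 3 &lt; 5 &amp; ok</p>
```

Like any such heuristic, this one can be fooled by "bad" code, e.g. an unclosed tag.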
>INVERSE-Mode< or >ENT2RAW-Mode<
converts all character-entity-codes (names or numbers) into the Amiga-ASCII
- the Latin-1 standard. So you can use Raw2Ent as an Ent2Raw. If you set
the >TAG<-Mode, this mode will not touch the codes: &gt; &lt; &amp; and
&quot;.
return-codes:
$RC=0 (OK) -> everything's fine!
$RC=5 (WARN) -> wrong usage
$RC=10 (ERROR) -> error (memory)
$RC=20 (FAIL) -> failure (input/output)
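Coming back to the INVERSE-Mode described above, a small Python sketch of the reverse direction. It relies on Python's html.unescape, which knows more entity-names than Latin-1 has characters, so it is more permissive than Raw2Ent. The function name and the tag_mode parameter are made up, and whether Raw2Ent also protects the numeric forms of the four special codes in TAG-Mode is not stated; this sketch protects only the named forms.

```python
import html
import re

ENTITY = re.compile(r"&#?[A-Za-z0-9]+;")
PROTECTED = {"&amp;", "&lt;", "&gt;", "&quot;"}


def entities_to_raw(text: str, tag_mode: bool = False) -> str:
    """INVERSE-Mode sketch: turn entity-codes back into characters."""
    def replace(match):
        entity = match.group()
        if tag_mode and entity in PROTECTED:
            return entity  # left untouched in TAG-Mode
        return html.unescape(entity)

    return ENTITY.sub(replace, text)


print(entities_to_raw("D&uuml;sseldorf &amp; K&#246;ln"))
# prints: Düsseldorf & Köln
print(entities_to_raw("D&uuml;sseldorf &amp; K&#246;ln", tag_mode=True))
# prints: Düsseldorf &amp; Köln
```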
2.2. Raw2Ent.rexx VER: 1.4.1 (14.07.96)
arguments:
FROM/A - The source-file (eight bit wide)
TO/A   - The destination-file (with entity-codes)
         [path without filename is not accepted]
TAG/S  - activates the TAG-Mode
HTML/S - activates the HTML-Mode
ENT/S  - default mode
modes:
see Raw2Ent-Usage (2.1.)
Raw2Ent.rexx manages:
- destination-path without filename
- suffix-handling (destinationfile has the suffix ".ent")
- progress-display
- environment-variables for automation without mistakes
2.3. CWISENV
Use CWISENV to set the environment-variables for a specific file, if you
don't want to use the Raw2Ent.rexx-arguments - e.g. in batch-files.
arguments:
FILENAME/A - is the filename of the input-file for Raw2Ent. NOTE: Don't
use pathnames! You must use the "cd"-command with the
pathname, where your textfile (input) can be found.
MODE/A - selects the Mode. You can choose: ENT, TAG, HTML
modes:
see Raw2Ent-Usage (2.1.)
2.4. ARGUMENT-PRIORITY
highest priority  FROM/A      - Must be used!
       ^          TO/A        - Must be used!
       |
       |          INVERSE/S   - disables ALL other switches except "TAG"
       |          HTML/S      - disables ALL the rest of the switches
       |          TOTALCODE/S - disables ALL the rest of the switches
       |          CODE/S      - disables the switches: "NOENT" and "ENT"
       |          NOENT/S     - disables the switches: "ENT" and "SMART"
       |          TAG/S       - disables the switches: "ENT" and "SMART"
       v          SMART/S     - disables the switch : "ENT"
lowest priority   ENT/S       - doesn't affect other switches
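The priority table can be read as a small resolution procedure. The Python sketch below follows the table from top to bottom; it reads "the rest of the switches" as "all switches further down the list". The function name and the alias handling are made up for this example.

```python
ALIASES = {"UML": "NOENT", "ENT2RAW": "INVERSE"}


def effective_switches(given):
    """Return the switches that stay active after applying the priorities."""
    active = {ALIASES.get(s.upper(), s.upper()) for s in given}
    if "INVERSE" in active:
        return active & {"INVERSE", "TAG"}
    if "HTML" in active:
        return {"HTML"}
    if "TOTALCODE" in active:
        return {"TOTALCODE"}
    if "CODE" in active:
        active -= {"NOENT", "ENT"}
    if "NOENT" in active:
        active -= {"ENT", "SMART"}
    if "TAG" in active:
        active -= {"ENT", "SMART"}
    if "SMART" in active:
        active -= {"ENT"}
    # ENT is the default mode when nothing else is set.
    return active or {"ENT"}


print(sorted(effective_switches(["TAG", "TOTALCODE"])))
# prints: ['TOTALCODE']
print(sorted(effective_switches(["TAG", "UML"])))
# prints: ['NOENT', 'TAG']
```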
3. LIMITATIONS
- A text can only be converted as a whole. Marking a part of it is not
  possible.
- The environment-variables hold no information about paths.
  So e.g. all "index.html"-files share the same variable.
  The variables are ignored, if you set the "TAG"-, "HTML"- or "ENT"-Switch
  on your own. Please note that only the ARexx-Script supports variables -
  NOT the assembler-program!
- No conversion-progress-display implemented. If you want to have this, you
  can use the old ARexx-Script Raw2Ent.rexx v1.3.
- The old ARexx-Script [1.3] is not compatible with the assembler program.
- Raw2Ent cannot be stopped by the break signals.
- Raw2Ent only supports the Amiga-8-Bit-ASCII [ISO-8859-1], but this is okay
  because there are many ways to convert texts from PC, Mac, Unix and other
  systems:
  - Use the CrossDOS-Commodity. (This is the best way - because every
    AmigaOS 2.1+ User has this program in the Tools/Commodities-directory!)
  - There are many converters on Aminet or in other Freeware-Sources. E.g.:
    - "CharConv" by Johan Billing [v1.6 from 1994]
- If you are looking for a good HTML to TXT converter, then try this one:
  - "HTTX" by Gabriele Favrin [v1.0 from 1996]
4. INSTALLATION
4.1. Copy "Raw2Ent" and "Raw2Ent.rexx" into the same directory.
4.2. Change the path-name in the "Raw2Ent.rexx"-Script.
4.3. makedir ENVARC:cwis/
5. EXAMPLES
5.1. "Raw2Ent Text.html Text.ent TAG"
Converts the file "Text.html" into the file "Text.ent" by entity-codes
without destruction of HTML-Tags and already converted Entity-Codes.
5.2. "Raw2Ent Text Text.ent" or
"Raw2Ent Text Text.ent ENT"
Converts the file "Text" into "Text.ent" without regard for HTML-Tags
or already converted Entity-Codes.
5.3. "Raw2Ent Text.html Text.ent HTML"
Just copies the file "Text.html" to "Text.ent".
5.4. "Raw2Ent Text.html Text.ent TAG CODE"
Same as example 5.1. but all character-entity-codes are represented by
code-numbers instead of code-names.
5.5. "Raw2Ent Text Text.ent CODE"
Same as example 5.2. but all character-entity-codes are represented by
code-numbers, except the four characters: < > " &, which are still
represented by their character-entity-names.
5.6. "Raw2Ent Text Text.uml TAG NOENT" or
"Raw2Ent Text Text.uml TAG UML"
This will convert raw text into seven-bit-wide-text without the use of
character-entity-codes. Raw2Ent will replace the high-bit characters with
alternative characters, words or shortcuts. The destination-file in this
example will not contain a single character-entity-code, because of the use
of the >TAG<-argument.
5.7. "Raw2Ent Text Text.code TOTALCODE"
The file "Text.code" will contain just code-number-entities, without any
exception. The destination file will be very large! Note: The TAG-Mode will
be ignored, if you use this mode. Therefore you cannot use this mode for
HTML-Files, and the resulting file can only be read with an HTML-Browser.
5.8. "Raw2Ent Text.ent Text INVERSE" or
"Raw2Ent Text.ent Text ENT2RAW"
This will convert the file "Text.ent" to the Amiga-ASCII-File "Text".
5.9. "Raw2Ent Text.ent Text TAG INVERSE" or
"Raw2Ent Text.ent Text TAG ENT2RAW"
This is the same as 5.8. The only difference is that the codes: &gt;
&lt; &amp; and &quot; will not be converted.
6. BYE!
----------------------------------------------------------------------------
THE AUTHOR IS NOT RESPONSIBLE FOR ANY LOSS OF DATA OR DAMAGE!
USE THIS FREEWARE-PROGRAM AT YOUR OWN RISK.
----------------------------------------------------------------------------
Send comments to:
Tamio Patrick Honma
eMail: honma@thepentagon.com
WWW: http://www.netforward.com/thepentagon/?honma
P.S. This program was made for the CWIS-Script-System on Amiga and PC.
The CWIS of the Heinrich-Heine-Universität Düsseldorf can be found here:
http://www.phil-fak.uni-duesseldorf.de/cwis/
7. LAST COMMENT:
Hey!? What do you want!? I'm just a sociology-, education- and
information-science-student and not a programming student!
Just send me your bug-reports, ... ;)
---> Raw2Ent-Assembler-Program <---
8. BUG REPORTS:
Reported by       Bug fixed                                   Version
Joakim Andersson  Entity-Codes Å and å missing                1.0.1 (04.09.96)
Tamio Honma       Last Byte in destination-file has been      1.1 (05.09.96)
                  deleted
Tamio Honma       Version-String missed one space-char        1.1.1 (06.09.96)
Marcus Beranek    Entity-Codes to ÿ,ï,Ï,æ,Æ,ø,Ø,ë,Ë missing   1.1.3 (06.09.96)
Tamio Honma       Mistake in circumflex-accents               1.1.4 (08.09.96)
9. HISTORY:
Version  Feature                                                 Date
1.0      first release                                           13.07.06 (*)
1.1      Improved the speed approx 20 (!!!) times!               05.09.96
         Version-String added
1.1.2    Optimised code (speed increased)                        06.09.96
         Return-Codes added (see paragraph 2.1 for return-codes)
1.1.3    HTML-Tags to °,¹,²,³ added (faked)                      06.09.96
1.1.4    All Entity-Codes available on Amiga-Charset included!   08.09.96 (*)14
         Entity-Codes referring to the HTML 3.2 standard
         All available names included
         Codes without names represented by code-number
         faking HTML-Tags removed
1.1.5    Just changed a code-number by entity-name "&sect;"      14.10.96
1.1.6    changed all code-numbers to entity-names (completely!)  30.10.96
1.2      added CODE-Mode                                         01.11.96 (*)
1.3      added NOENT-Mode and TOTALCODE-Mode                     03.11.96 (*)
         Version-String will be displayed in the usage-text
1.4      added SMART-Mode                                        06-07.11.96 (*)
1.5      added INVERSE-Mode for names and codes (Ent2Raw)        07-10.11.96 (*)
         added HELP-Text
(*) = released in Aminet
      number represents the number of the Aminet CD-ROM